{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "UxDZW841tkSi" }, "source": [ "# Tutorial III: Handwritten digit recognition" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "_Nha4NG_tkSl" }, "source": [ "

\n", "Bern Winter School on Machine Learning, 27-31 January 2020
\n", "Prepared by Mykhailo Vladymyrov.\n", "

\n", "\n", "This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "HjuW0SKStkSm" }, "source": [ "In this session we will create a fully-connected neural network to perform handwritten digit recognition using TensorFlow" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Dit404ghtkSr" }, "source": [ "## 1. Load necessary libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "0fcfiHcmuAPA" }, "outputs": [], "source": [ "# if using google colab\n", "%tensorflow_version 2.x" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "YbRY_dIutkSr" }, "outputs": [], "source": [ "import sys\n", "import os\n", "import shutil\n", "\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import IPython.display as ipyd\n", "import tensorflow.compat.v1 as tf\n", "import tensorflow.keras.datasets.mnist as mnist\n", "tf.disable_v2_behavior()\n", "\n", "# We'll tell matplotlib to inline any drawn figures like so:\n", "%matplotlib inline\n", "plt.style.use('ggplot')\n", "\n", "from IPython.core.display import HTML\n", "HTML(\"\"\"\"\"\")\n", "%load_ext tensorboard" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def show_graph(g=None):\n", " if os.path.exists('logs'):\n", " shutil.rmtree('logs')\n", " summary_writer = tf.summary.FileWriter('./logs', graph=g if g else tf.get_default_graph())\n", " %tensorboard --logdir logs" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "VaMSPkKJtkSt" }, "source": [ "### 1.1. A bit of things" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "WcPIsppKtkSu" }, "source": [ "The training as we saw in in last section is done iteratively, adjusting the model parameters on each step.\n", "\n", "We perform optimization several times for all traininng dataset. Going through all this dataset is refered to as 'epoch'.\n", "\n", "When we do training its usually done in two loops. In outer loop we iterate over all epochs. For each epoch we usually split the dataset into small chuncks, 'mini-batches'. Optimization is then performed on each minibatch.\n", "\n", "It is important that data doesn't go to the training pipeline in same order. So the overall scheme looks like this (pseudocode):\n", "\n", "\n", "```\n", "x,y = get_training_data()\n", "for epoch in range(number_epochs):\n", " x_shfl,y_shfl = shuffle(x,y)\n", " \n", " for mb_idx in range(number_minibatches_in_batch):\n", " x_mb,y_mb = get_minibatch(x_shfl,y_shfl, mb_idx)\n", " \n", " optimize_on(data=x_mb, labels=y_mb)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "bTwTRW92tkSu" }, "source": [ "Shuffling can be easily done using permuted indexes." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "9y5IoSaEtkSv" }, "outputs": [], "source": [ "#some array\n", "arr = np.array([110,111,112,113,114,115,116])\n", "\n", "#we can get sub-array for a set of indexes, eg:\n", "idx_1_3 = [1,3]\n", "sub_arr_1_3 = arr[idx_1_3]\n", "print (arr,'[',idx_1_3,']','->', sub_arr_1_3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "JdnYF8lQtkSw" }, "outputs": [], "source": [ "ordered_idx = np.arange(7)\n", "permuteded_idx = np.random.permutation(7)\n", "print(ordered_idx)\n", "print(permuteded_idx)\n", "\n", "permuted_arr = arr[permuteded_idx]\n", "print (arr,'[',permuteded_idx,']','->', permuted_arr)\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "2GA-bFOMtkSy" }, "source": [ "Some additional np things in this seminar:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "NhimaptitkSz" }, "outputs": [], "source": [ "#index of element with highest value\n", "np.argmax(permuted_arr)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "Tb0wl9mFtkS0" }, "outputs": [], "source": [ "arr2d = np.array([[0,1],[2,3]])\n", "print(arr2d)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "x6OOEX3BtkS2" }, "outputs": [], "source": [ "#flatten\n", "arr_flat = arr2d.flatten()\n", "#reshape\n", "arr_4 = arr2d.reshape((4))\n", "arr_4_1 = arr2d.reshape((4,1))\n", "\n", "print (arr_flat)\n", "print (arr_4)\n", "print (arr_4_1)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "rcuT4JYRtkS4" }, "source": [ "## 2. Load the data" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "JrbUAUlrtkS4" }, "source": [ "First we will load the data: 60000 training images and 10000 images for validation." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "L33vRL03tkS7" }, "outputs": [], "source": [ "(x_train_2d, y_train), (x_test_2d, y_test) = mnist.load_data()\n", "x_train_2d = x_train_2d/255.0\n", "x_test_2d = x_test_2d/255.0\n", "\n", "\n", "print ('train: data shape', x_train_2d.shape, 'label shape', y_train.shape)\n", "print ('test: data shape', x_test_2d.shape, 'label shape', y_test.shape)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "c1eXOuZOtkS8" }, "source": [ "Each image is a 28x28 pixels. For this model we will interppret it as a 1D array of 784 elements.\n", "Additionally we will use the labels a in so-called one hot encoding: each label is a vector of length 10, with all elements equal to 0, except, corresponding to the number written in the image. 
Let's take a look:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "-JouvebGFSzj" }, "outputs": [], "source": [ "n_train = x_train_2d.shape[0]\n", "n_test = x_test_2d.shape[0]\n", "\n", "x_train = x_train_2d.reshape([n_train, -1])\n", "x_test = x_test_2d.reshape([n_test, -1])\n", "\n", "y_train_1_hot = np.zeros((n_train, y_train.max()+1))\n", "y_train_1_hot[np.arange(n_train),y_train] = 1\n", "\n", "y_test_1_hot = np.zeros((n_test, y_test.max()+1))\n", "y_test_1_hot[np.arange(n_test), y_test] = 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "QRO6zv6atkS_" }, "outputs": [], "source": [ "img = x_train_2d[0]\n", "lbl = y_train[0]\n", "lbl_1_hot = y_train_1_hot[0]\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "gBbmxF94p8Ve" }, "source": [ "Let's check the images." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "DrN9qvs3tkTA" }, "outputs": [], "source": [ "plt.imshow(img, cmap='gray', interpolation='nearest')\n", "plt.grid(False)\n", "print('one-hot label:',lbl_1_hot, '. Actual label:', lbl )" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "-d-oTmh3tkTE" }, "outputs": [], "source": [ "fig, axs = plt.subplots(5, 5, figsize=(10,10))\n", "for idx, im in enumerate(x_train_2d[0:25]):\n", " y_idx = idx // 5\n", " x_idx = idx % 5\n", " axs[y_idx][x_idx].imshow(im, cmap='gray', interpolation='nearest')\n", " axs[y_idx][x_idx].grid(False)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "YzpETsFotkTF" }, "source": [ "## 3. Bulding blocks of a neural network" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "57Y2c3l8tkTG" }, "source": [ "Neural network consists of layers of neurons. Each neuron perfroms 2 operations.\n", "1. Calculate the linear transformation of the input vector $\\bar x$: \n", "$$z = \\bar W \\cdot \\bar x + b = \\sum {W_i x_i} + b$$ where $\\bar W$ is vector of weights and $b$ - bias.\n", "2. 
"2. Perform a nonlinear transformation of the result using an activation function $f$: $$y = f(z)$$ Here we will use the rectified linear unit (ReLU) activation in all but the last layer.\n", "\n", "In a fully connected neural network each layer is a set of $N$ neurons, each performing a different transformation of the same layer inputs $\\bar x = [x_i]$ and producing the output vector $\\bar y = [y_j]_{j=1..N}$: $$y_j = f(\\bar W_j \\cdot \\bar x + b_j)$$\n", "\n", "Since the output of each layer forms the input of the next layer, one can write for layer $l$: $$x^l_j = f(\\bar W^l_j \\cdot \\bar x^{l-1} + b^l_j)$$ where $\\bar x^0$ is the network's input vector.\n", "\n", "To simplify building the network, we'll define a helper function that creates a layer of neurons with a given number of outputs:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "code_folding": [ 0 ], "colab": {}, "colab_type": "code", "id": "_-6EYYNctkTG" }, "outputs": [], "source": [ "def fully_connected_layer(x, n_output, name=None, activation=None):\n", "    \"\"\"Fully connected layer.\n", "\n", "    Parameters\n", "    ----------\n", "    x : tf.Tensor\n", "        Input tensor to connect\n", "    n_output : int\n", "        Number of output neurons\n", "    name : None, optional\n", "        TF Scope to apply\n", "    activation : None, optional\n", "        Non-linear activation function\n", "\n", "    Returns\n", "    -------\n", "    h, W : tf.Tensor, tf.Tensor\n", "        Output of the fully connected layer and the weight matrix\n", "    \"\"\"\n", "    # flatten the input to 2D (batch, features) if needed\n", "    if len(x.get_shape()) != 2:\n", "        x = tf.layers.flatten(x)\n", "\n", "    n_input = x.get_shape().as_list()[1]\n", "\n", "    with tf.variable_scope(name or \"fc\", reuse=None):\n", "        W = tf.get_variable(\n", "            name='W',\n", "            shape=[n_input, n_output],\n", "            dtype=tf.float32,\n", "            initializer=tf.initializers.glorot_uniform())\n", "\n", "        b = tf.get_variable(\n", "            name='b',\n", "            shape=[n_output],\n", "            dtype=tf.float32,\n", "            initializer=tf.initializers.constant(0.0))\n", "\n", "        h = tf.nn.bias_add(\n", "            name='h',\n", "            value=tf.matmul(x, W),\n", "            bias=b)\n", "\n", "        if activation:\n", "            h = activation(h)\n", "\n", "    return h, W" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "yq4u6F41tkTI" }, "source": [ "In the case of classification we use the *softmax* transformation as the non-linearity in the last layer: $$y_i = \\sigma(\\bar z)_i = \\frac{ e^{z_i}}{\\sum_j e^{z_j}}$$" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "KLUWcu7-tkTJ" }, "source": [ "This corresponds to the one-hot labels that we use.\n", "Finally, we will use the cross entropy between the output $y$ and the ground truth (GT) $y_{GT}$ as the loss function: $$H(y, y_{GT}) = - \\sum_i y_{GT, i} \\log(y_{i})$$\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "agJ3gkg0tkTJ" }, "source": [ "## 4. Building a neural network" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "FfJEnLTmtkTK" }, "source": [ "The number of inputs is given by the size of the input data, i.e. the image. The number of outputs is given by the number of classes, 10 in our case." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "LC1SpGLbtkTL" }, "outputs": [], "source": [ "n_input = x_train.shape[1]\n", "n_output = y_train_1_hot.shape[1]" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "G7SF4JZgtkTP" }, "source": [ "Let's first build the simplest possible network, with just one layer."
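] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before that, here is a quick numeric sanity check of the softmax and cross-entropy formulas above, as a minimal NumPy sketch (the logits `z` and the ground-truth vector are made-up illustrative values, not part of the model):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z = np.array([2.0, 1.0, 0.1])           # made-up logits\n", "y_soft = np.exp(z) / np.sum(np.exp(z))  # softmax: positive values summing to 1\n", "print('softmax(z) =', y_soft, ', sum =', y_soft.sum())\n", "\n", "y_gt = np.array([1.0, 0.0, 0.0])        # one-hot ground truth\n", "H = -np.sum(y_gt * np.log(y_soft))      # cross entropy H(y, y_GT)\n", "print('cross entropy =', H)"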
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "YlVOlJ6VtkTP" }, "outputs": [], "source": [ "g = tf.Graph()\n", "with g.as_default():\n", " X = tf.placeholder(name='X', dtype=tf.float32, shape=[None, n_input])\n", " Y = tf.placeholder(name='Y', dtype=tf.float32, shape=[None, n_output])\n", " \n", " # 1 layer: 784 inputs -> 10 outputs\n", " # no (i.e. linear) activation\n", " L1, W1 = fully_connected_layer(X , 10, 'L1')\n", " # softmax activation\n", " Y_onehot = tf.nn.softmax(L1, name='Logits')\n", " \n", " # prediction: onehot->integer\n", " Y_pred = tf.argmax(Y_onehot, axis=1, name='YPred')\n", " \n", " # cross_entropy = tf.reduce_mean(-tf.reduce_sum(Y * tf.log(Y_onehot), reduction_indices=[1]))\n", " # here we use numerically stable implementation of the same:\n", " cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=L1, labels=Y)\n", " optimizer = tf.train.AdamOptimizer(learning_rate=0.003).minimize(cross_entropy)\n", " \n", " # get fraction of correctly assigned labels\n", " Y_true = tf.argmax(Y, 1)\n", " Correct = tf.equal(Y_true, Y_pred, name='CorrectY')\n", " Accuracy = tf.reduce_mean(tf.cast(Correct, dtype=tf.float32), name='Accuracy')" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "6HIikZnSJhzT" }, "source": [ "We will train for 5 epochs, with minibatches of size 64. This is very similar to what we did in last session: split all data in minibatches, run the optimizer, for each epoch store the training and validadtion accuracy for plotting." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "code_folding": [], "colab": {}, "colab_type": "code", "id": "KP4ve-rDtkTS" }, "outputs": [], "source": [ "with tf.Session(graph=g) as sess:\n", " acc_val = []\n", " acc_trn = []\n", " \n", " sess.run(tf.global_variables_initializer())\n", "\n", " # Now actually do some training:\n", " mini_batch_size = 64\n", " n_epochs = 5\n", "\n", " for epoch_i in range(n_epochs):\n", " # iterate minibatches\n", " idx = np.random.permutation(n_train)\n", " for mb_idx in range(n_train // mini_batch_size):\n", " sub_idx = idx[mini_batch_size * mb_idx : mini_batch_size * (mb_idx+1)]\n", " x_batch, y_batch = x_train[sub_idx], y_train_1_hot[sub_idx]\n", "\n", " sess.run(optimizer, feed_dict={\n", " X: x_batch,\n", " Y: y_batch\n", " })\n", " \n", "\n", " acr_v = sess.run(Accuracy,\n", " feed_dict={\n", " X: x_test,\n", " Y: y_test_1_hot\n", " })\n", " acr_t = sess.run(Accuracy,\n", " feed_dict={\n", " X: x_train,\n", " Y: y_train_1_hot\n", " })\n", " print(acr_t, acr_v)\n", " \n", " acc_val.append(acr_v)\n", " acc_trn.append(acr_t)\n", "\n", " # final test accuracy:\n", " corr, accr = sess.run((Correct, Accuracy),\n", " feed_dict={\n", " X: x_test,\n", " Y: y_test_1_hot\n", " })\n", " \n", " \n", " # get index of first incorrectly recognize digit and display it\n", " wrong_idx = [i for i,c in enumerate(corr) if c == False]\n", " wrong_idx0 = wrong_idx[0]\n", " wrong0_lbl = sess.run(Y_pred,\n", " feed_dict={\n", " X: x_test[wrong_idx0:wrong_idx0+1],\n", " Y: y_test_1_hot[wrong_idx0:wrong_idx0+1]\n", " })[0]\n", " \n", " #store final value of the W1\n", " W1_res = sess.run(W1)\n", " \n", " fig, axs = plt.subplots(1, 2, figsize=(10,5))\n", " axs[0].plot(acc_trn)\n", " axs[0].plot(acc_val)\n", " axs[0].legend(('training accuracy', 'validation accuracy'), loc='lower right')\n", " axs[1].imshow(x_test_2d[wrong_idx0], cmap='gray', interpolation='nearest')\n", " axs[1].grid(False)\n", " plt.show()\n", 
" print('found label:',wrong0_lbl, 'true label:', y_test[wrong_idx0])" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "WgYoROCLrBQW" }, "source": [ "The learned model parameters W1 are a matrix of weights that show importance of each input pixel (784) for each of the 10 outputs." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "i-nnCph8rU01" }, "outputs": [], "source": [ "print(W1_res.shape)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "i02IbJ2gtkTT" }, "source": [ "Let's visualize the trained weights:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "L371n9COtkTU" }, "outputs": [], "source": [ "W1_res = W1_res.reshape(28,28,10)\n", "_, axs = plt.subplots(1, 10, figsize=(13,5))\n", "for i in range(10):\n", " axs[i].imshow(W1_res[..., i], cmap='plasma', interpolation='nearest')\n", " axs[i].grid(False)\n", " axs[i].axis('off')" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Wuo23m_ytkTW" }, "source": [ "Here we classify images into 10 classes. But think of it: does the network know, or need to know that those were images? For the network each image is just a 784 values. And it finds that there is a patten in those!\n", "\n", "Same way one can feed any other bunch of numbers, and the network will try it's best to fugure out a relation pannern between those." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "cgHlUGwDtkTW" }, "source": [ "## 5. Excercise 1" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Y4IKSsj9tkTX" }, "source": [ "Build a network with two layers, first with `tf.nn.relu` ReLU activation and 1500 neurons and second one with 10 and softmax activation. Start with `learning_rate` of 0.001 and find optimal value." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "fKx1wnrRtkTX" }, "outputs": [], "source": [ "... #1. create graph\n", "with g.as_default():\n", " X = tf.placeholder(name='X', dtype=tf.float32, shape=[None, n_input])\n", " Y = ... #2. create labels placeholder\n", " \n", " L1, W1 = ... #3. first layer: takes X as input, 1500, relu\n", " L2, W2 = ... #4. second layer, 10 neurons, NO ACTIVATION HERE\n", " \n", " Y_onehot = tf.nn.softmax(L2, name='Logits')\n", " Y_pred = tf.argmax(Y_onehot, axis=1, name='YPred')\n", " cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=L2, labels=Y)\n", " optimizer = ... #5. create optimizer\n", " \n", " Y_true = tf.argmax(Y, 1)\n", " Correct = ... #6. correct labels mask and accuracy as mean value of mask\n", " Accuracy = ...\n", " \n", " # for visualization we wil calculate the gradients\n", " #gradients of each of 10 outputs over input x\n", " yh_grad = [tf.gradients(Y_onehot[..., i], X) for i in range(10)]\n", " #gradient of maximal output over input x\n", " ym_grad = tf.gradients(tf.reduce_max(Y_onehot, axis=1), X)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "m4V1-wkRtkTZ", "scrolled": false }, "outputs": [], "source": [ "with tf.Session(graph=g) as sess:\n", " ... #7. 
"    ... #7. initialize global variables\n", "    \n", "    acc_val = []\n", "    acc_trn = []\n", "\n", "    mini_batch_size = 64\n", "    n_epochs = 5\n", "    for epoch_i in range(n_epochs):\n", "        # iterate minibatches\n", "        idx = np.random.permutation(n_train)\n", "        for mb_idx in range(n_train // mini_batch_size):\n", "            sub_idx = idx[mini_batch_size * mb_idx : mini_batch_size * (mb_idx+1)]\n", "            x_batch, y_batch = x_train[sub_idx], y_train_1_hot[sub_idx]\n", "\n", "            ... #8. run the optimizer\n", "        \n", "        acr_v = sess.run(Accuracy,\n", "                         feed_dict={\n", "                             X: x_test,\n", "                             Y: y_test_1_hot\n", "                         })\n", "        acr_t = sess.run(Accuracy,\n", "                         feed_dict={\n", "                             X: x_train,\n", "                             Y: y_train_1_hot\n", "                         })\n", "        print(acr_t, acr_v)\n", "        acc_trn.append(acr_t)\n", "        acc_val.append(acr_v)\n", "\n", "    # save also the gradients:\n", "    yh_grad_res, ym_grad_res = sess.run((yh_grad, ym_grad), feed_dict={X: x_test})\n", "\n", "    # Print final test accuracy:\n", "    print('final test accuracy: ', sess.run(Accuracy,\n", "                                            feed_dict={\n", "                                                X: x_test,\n", "                                                Y: y_test_1_hot\n", "                                            }))\n", "\n", "    plt.plot(acc_trn)\n", "    plt.plot(acc_val)\n", "    plt.legend(('training accuracy', 'validation accuracy'), loc='lower right')\n", "    " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "hh6I4wJLtkTa" }, "source": [ "## 6. Gradients visualization" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "1ujfuAkVtkTb" }, "source": [ "We will display several images and the corresponding gradients of the maximal output activation, as well as of all activations. This might help to better understand how our network processes the input data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "fGP4fdkJtkTc" }, "outputs": [], "source": [ "grad = np.asarray(yh_grad_res)\n", "grad = grad.reshape(grad.shape[0], grad.shape[2], 28,28)\n", "gradm = np.asarray(ym_grad_res[0])\n", "gradm = gradm.reshape(gradm.shape[0], 28,28)\n", "\n", "n_img_d = 10\n", "_, axs = plt.subplots(n_img_d, 12, figsize=(15,17./12*n_img_d))\n", "for i in range(n_img_d):\n", "    axs[i, 0].imshow(x_test_2d[i], cmap='gray', interpolation='nearest')\n", "    axs[i, 0].set_title(y_test[i])\n", "    axs[i, 0].grid(False)\n", "    axs[i, 0].axis('off')\n", "\n", "    axs[i, 1].imshow(gradm[i], cmap='seismic', interpolation='nearest')\n", "    axs[i, 1].set_title('max grad')\n", "    axs[i, 1].grid(False)\n", "    axs[i, 1].axis('off')\n", "\n", "    gmin = np.min(grad[:, i, ...])\n", "    gmax = np.max(grad[:, i, ...])\n", "    for j in range(10):\n", "        axs[i,j+2].set_title(str(j))\n", "        axs[i,j+2].imshow(grad[j, i], cmap='plasma', interpolation='nearest', vmin = gmin, vmax = gmax)\n", "        axs[i,j+2].grid(False)\n", "        axs[i,j+2].axis('off')" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "kcLUPxNwtkTd" }, "source": [ "## 7. Exercise 2" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "7BWuZAD3tkTf" }, "source": [ "Build a network with 3 or more layers. Try to get a test accuracy above 98.5%.\n", "It is better to copy and modify the previous code than to modify it in place: that way you can compare the results.\n", "Visualize your graph with `show_graph(my_cute_graph)`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "JCJ1i_botkTg" }, "outputs": [], "source": [ "g = tf.Graph()\n", "......\n", ".....\n", "..."
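] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you get stuck, here is one possible shape of such a graph (a sketch, not the reference solution: the layer sizes and the learning rate are illustrative assumptions to tune, and the training loop is the same as before):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# a sketch of one possible 3-layer graph; sizes and learning rate are illustrative\n", "g3 = tf.Graph()\n", "with g3.as_default():\n", "    X = tf.placeholder(name='X', dtype=tf.float32, shape=[None, n_input])\n", "    Y = tf.placeholder(name='Y', dtype=tf.float32, shape=[None, n_output])\n", "\n", "    L1, _ = fully_connected_layer(X, 1500, 'L1', activation=tf.nn.relu)\n", "    L2, _ = fully_connected_layer(L1, 300, 'L2', activation=tf.nn.relu)\n", "    L3, _ = fully_connected_layer(L2, 10, 'L3')  # logits: no activation here\n", "\n", "    Y_onehot = tf.nn.softmax(L3)\n", "    Y_pred = tf.argmax(Y_onehot, axis=1, name='YPred')\n", "    cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=L3, labels=Y)\n", "    optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cross_entropy)\n", "\n", "    Correct = tf.equal(tf.argmax(Y, 1), Y_pred, name='CorrectY')\n", "    Accuracy = tf.reduce_mean(tf.cast(Correct, dtype=tf.float32), name='Accuracy')\n", "\n", "# visualize the graph in TensorBoard\n", "show_graph(g3)"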
] } ], "metadata": { "anaconda-cloud": {}, "colab": { "collapsed_sections": [], "name": "Tutorial III.ipynb", "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 1 }